SeqProp - Protein Sequence Properties
This notebook gives an overview of the available calculations for properties of a single protein sequence.
Note: See ssbio.protein.sequence.seqprop.SeqProp for a description of all the available attributes and functions.
Imports
In [ ]:
import sys
import logging
import os.path as op
In [ ]:
# Import the SeqProp class
from ssbio.protein.sequence.seqprop import SeqProp
In [ ]:
# Printing multiple outputs per cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
Logging
Set the logging level in logger.setLevel(logging.<LEVEL_HERE>) to specify how verbose you want the pipeline to be. DEBUG is the most verbose.
CRITICAL - Only really important messages are shown
ERROR - Major errors
WARNING - Warnings that don't affect running of the pipeline
INFO (used by default in this notebook) - Info such as the number of structures mapped per gene
DEBUG - Really detailed information that will print out a lot of stuff
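The level acts as a threshold: anything below it is dropped. A small standard-library sketch makes this visible. It uses a separate, hypothetically named logger (level_demo_<N>) with its own in-memory handler, so it does not interfere with the root logger configured in the next cells.

```python
import io
import logging


def demo_levels(level):
    """Emit one message per level and return the lines that pass the threshold."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

    # A separate logger so the notebook's root logger is left untouched
    demo_logger = logging.getLogger(f"level_demo_{level}")
    demo_logger.handlers = [handler]
    demo_logger.propagate = False
    demo_logger.setLevel(level)

    demo_logger.debug("debug message")
    demo_logger.info("info message")
    demo_logger.warning("warning message")
    demo_logger.error("error message")
    demo_logger.critical("critical message")
    return stream.getvalue().splitlines()


print(demo_levels(logging.WARNING))
# ['WARNING: warning message', 'ERROR: error message', 'CRITICAL: critical message']
```

Switching the argument to logging.DEBUG lets all five messages through, which is why DEBUG output gets long quickly.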
In [ ]:
# Create logger
logger = logging.getLogger()
logger.setLevel(logging.INFO) # SET YOUR LOGGING LEVEL HERE #
In [ ]:
# Other logger stuff for Jupyter notebooks
handler = logging.StreamHandler(sys.stderr)
formatter = logging.Formatter('[%(asctime)s] [%(name)s] %(levelname)s: %(message)s', datefmt="%Y-%m-%d %H:%M")
handler.setFormatter(formatter)
logger.handlers = [handler]
Initialization of the project
Set these two variables:
PROTEIN_ID - Your protein ID
PROTEIN_SEQ - Your protein sequence
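Before creating the object, it may help to check that the sequence contains only the 20 standard one-letter amino acid codes, since some property calculators can reject or mishandle ambiguous letters such as X or B, or a trailing stop symbol (*). The check_protein_seq helper below is hypothetical and not part of ssbio.

```python
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def check_protein_seq(seq):
    """Return the sorted non-standard characters found in seq (empty list if clean)."""
    return sorted(set(seq.upper()) - STANDARD_AMINO_ACIDS)


print(check_protein_seq("MGKEVMGKKENEMAQ"))  # []
print(check_protein_seq("MGKXB*"))           # ['*', 'B', 'X']
```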
In [ ]:
# SET IDS HERE
PROTEIN_ID = 'YIAJ_ECOLI'
PROTEIN_SEQ = 'MGKEVMGKKENEMAQEKERPAGSQSLFRGLMLIEILSNYPNGCPLAHLSELAGLNKSTVHRLLQGLQSCGYVTTAPAAGSYRLTTKFIAVGQKALSSLNIIHIAAPHLEALNIATGETINFSSREDDHAILIYKLEPTTGMLRTRAYIGQHMPLYCSAMGKIYMAFGHPDYVKSYWESHQHEIQPLTRNTITELPAMFDELAHIRESGAAMDREENELGVSCIAVPVFDIHGRVPYAVSISLSTSRLKQVGEKNLLKPLRETAQAISNELGFTVRDDLGAIT'
In [ ]:
# Create the SeqProp object
my_seq = SeqProp(id=PROTEIN_ID, seq=PROTEIN_SEQ)
SeqProp.write_fasta_file(outfile, force_rerun=False)
Write a FASTA file for the protein sequence; seq will then be loaded directly from this file.
Parameters:
- outfile (str): Path to the new FASTA file to be written
- force_rerun (bool): Whether an existing file should be overwritten
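For reference, a single-record FASTA file is a ">" header line holding the ID, followed by the sequence, conventionally wrapped at a fixed width. The sketch below writes one with the standard library and mimics the documented force_rerun behaviour (an existing file is kept unless force_rerun=True). The write_fasta helper and its 60-character line width are assumptions for illustration; ssbio's own implementation may differ.

```python
import os
import tempfile


def write_fasta(record_id, seq, outfile, force_rerun=False, width=60):
    """Write a one-record FASTA file; keep an existing file unless force_rerun is True."""
    if os.path.exists(outfile) and not force_rerun:
        return outfile
    with open(outfile, "w") as handle:
        handle.write(f">{record_id}\n")
        # Wrap the sequence into lines of at most `width` characters
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")
    return outfile


DEMO_SEQ = "MGKEVMGKKENEMAQEKERPAGSQSLFRGLMLIEILSNYPNGCPLAHLSELAGLNKSTVHRL"
demo_path = os.path.join(tempfile.mkdtemp(), "demo.fasta")
write_fasta("YIAJ_ECOLI", DEMO_SEQ, demo_path)
with open(demo_path) as handle:
    print(handle.read())
```

Calling write_fasta again on the same path without force_rerun=True leaves the file untouched, which is what lets a pipeline skip work that was already done.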
In [ ]:
# Write a temporary FASTA file for property calculations that require a FASTA file as input
import tempfile
ROOT_DIR = tempfile.gettempdir()
my_seq.write_fasta_file(outfile=op.join(ROOT_DIR, 'tmp.fasta'), force_rerun=True)
my_seq.sequence_path
Computing and storing protein properties
A SeqProp object is simply an extension of the Biopython SeqRecord object. Global properties, which describe or summarize the entire protein sequence, are stored in the annotations attribute, while local, residue-specific properties are stored in the letter_annotations attribute.
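The distinction can be seen directly on a plain Biopython SeqRecord. This sketch uses Biopython only, not ssbio, and the annotation names are made up for illustration. annotations is an ordinary dictionary for whole-sequence values, while letter_annotations only accepts sequences whose length matches the protein, one entry per residue.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq("MGKEVMGKK"), id="YIAJ_ECOLI_fragment")

# Global property: a single value describing the whole sequence
record.annotations["length-example"] = len(record.seq)

# Local property: one value per residue (here, whether the residue is charged)
record.letter_annotations["charged-example"] = [aa in "DEKR" for aa in str(record.seq)]

# A per-residue list of the wrong length is rejected by Biopython
try:
    record.letter_annotations["too_short-example"] = [True, False]
    length_check_enforced = False
except (TypeError, ValueError):
    length_check_enforced = True

print(record.annotations)
print(sum(record.letter_annotations["charged-example"]))  # 4
print(length_check_enforced)  # True
```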
Basic global properties
SeqProp.get_biopython_pepstats()
Run Biopython's built-in ProteinAnalysis module and store the statistics in the annotations attribute.
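The underlying Biopython module can also be called on its own. The sketch below computes a handful of global properties with Bio.SeqUtils.ProtParam.ProteinAnalysis; it does not assume which properties get_biopython_pepstats actually stores or under which "-biop" key names. The basic_pepstats helper is hypothetical.

```python
from Bio.SeqUtils.ProtParam import ProteinAnalysis


def basic_pepstats(seq):
    """Compute a few global properties with Biopython's ProteinAnalysis."""
    analysis = ProteinAnalysis(seq)
    return {
        "molecular_weight": analysis.molecular_weight(),    # Da
        "isoelectric_point": analysis.isoelectric_point(),
        "aromaticity": analysis.aromaticity(),              # relative frequency of F + W + Y
        "instability_index": analysis.instability_index(),
        "gravy": analysis.gravy(),                          # mean hydropathy (Kyte-Doolittle by default)
    }


SEQ = "MGKEVMGKKENEMAQEKERPAGSQSLFRGLMLIEILSNYPNGCPLAHLSE"
stats = basic_pepstats(SEQ)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```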
In [ ]:
# Global properties using the Biopython ProteinAnalysis module
my_seq.get_biopython_pepstats()
{k:v for k,v in my_seq.annotations.items() if k.endswith('-biop')}
SeqProp.get_emboss_pepstats()
Run the EMBOSS pepstats program on the protein sequence. Stores statistics in the annotations attribute and saves a .pepstats file of the results in the same directory as the sequence file.
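Because pepstats is a command-line EMBOSS tool, the call can be sketched with subprocess. -sequence and -outfile are standard pepstats qualifiers; the helper names, and the choice to write the report next to the input as <name>.pepstats (mirroring the description above), are illustrative and not ssbio's code. The command only runs if pepstats is found on PATH.

```python
import os
import shutil
import subprocess


def pepstats_command(fasta_path):
    """Build a pepstats command that writes <name>.pepstats next to the input FASTA."""
    outfile = os.path.splitext(fasta_path)[0] + ".pepstats"
    return ["pepstats", "-sequence", fasta_path, "-outfile", outfile], outfile


def run_pepstats(fasta_path):
    """Run pepstats if EMBOSS is installed; return the report path, or None if it is missing."""
    cmd, outfile = pepstats_command(fasta_path)
    if shutil.which("pepstats") is None:
        return None
    subprocess.run(cmd, check=True, capture_output=True)
    return outfile


print(pepstats_command("/tmp/tmp.fasta")[0])
```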
In [ ]:
# Global properties from the EMBOSS pepstats program
my_seq.get_emboss_pepstats()
{k:v for k,v in my_seq.annotations.items() if k.endswith('-pepstats')}
SeqProp.get_aggregation_propensity(email, password, cutoff_v=5, cutoff_n=5, run_amylmuts=False, outdir=None)
Run the AMYLPRED2 web server to calculate the aggregation propensity of this protein sequence, which is the number of aggregation-prone segments on the unfolded protein sequence. Stores statistics in the annotations attribute under the key aggprop-amylpred. See ssbio.protein.sequence.properties.aggregation_propensity for instructions and details.
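To make "number of aggregation-prone segments" concrete, here is a sketch of how such a count can be derived from per-residue predictions. It assumes (an interpretation not confirmed on this page) that cutoff_v is the minimum number of prediction methods that must flag a residue, and cutoff_n the minimum number of consecutive flagged residues that form a segment. The per-residue vote list is hypothetical input, not AMYLPRED2's actual output format.

```python
def count_aggregation_segments(votes, cutoff_v=5, cutoff_n=5):
    """Count runs of >= cutoff_n consecutive residues that each have >= cutoff_v votes."""
    segments = 0
    run = 0
    for v in votes:
        if v >= cutoff_v:
            run += 1
        else:
            if run >= cutoff_n:
                segments += 1
            run = 0
    # A qualifying run may end at the last residue
    if run >= cutoff_n:
        segments += 1
    return segments


# Hypothetical per-residue vote counts: runs of length 5, 4 and 6
votes = [6] * 5 + [0] + [5] * 4 + [1] + [7] * 6
print(count_aggregation_segments(votes))              # 2
print(count_aggregation_segments(votes, cutoff_n=4))  # 3
```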
In [ ]:
# Aggregation propensity - the predicted number of aggregation-prone segments on an unfolded protein sequence
my_seq.get_aggregation_propensity(outdir=ROOT_DIR, email='nmih@ucsd.edu', password='ssbiotest', cutoff_v=5, cutoff_n=5, run_amylmuts=False)
{k:v for k,v in my_seq.annotations.items() if k.endswith('-amylpred')}
SeqProp.get_kinetic_folding_rate(secstruct, at_temp=None)
Run the FOLD-RATE web server to calculate the kinetic folding rate given an amino acid sequence and its structural classification (alpha/beta/mixed). Stores statistics in the annotations attribute under the key kinetic_folding_rate_<TEMP>-foldrate. See ssbio.protein.sequence.properties.kinetic_folding_rate.get_foldrate() for instructions and details.
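Folding rates are usually expressed as first-order rate constants k_f (in 1/s) or as ln(k_f). One way to interpret such a number is the folding half-time, t_1/2 = ln 2 / k_f. The sketch assumes the value passed in is k_f in 1/s; check get_foldrate() for whether ssbio stores k_f or ln(k_f), and use the second helper in the latter case. Both helpers are hypothetical.

```python
import math


def folding_half_time(k_f):
    """Half-time in seconds of a first-order folding reaction with rate constant k_f (1/s)."""
    if k_f <= 0:
        raise ValueError("k_f must be positive")
    return math.log(2) / k_f


def folding_half_time_from_ln(ln_k_f):
    """Same as folding_half_time, for a rate reported as ln(k_f)."""
    return folding_half_time(math.exp(ln_k_f))


print(folding_half_time(math.log(2)))  # 1.0
```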
In [ ]:
# Kinetic folding rate - the predicted rate of folding for this protein sequence
secstruct_class = 'mixed'
my_seq.get_kinetic_folding_rate(secstruct=secstruct_class)
{k:v for k,v in my_seq.annotations.items() if k.endswith('-foldrate')}
SeqProp.get_thermostability(at_temp)
Run the thermostability calculator using either the Dill or the Oobatake method. Stores the calculated (dG, Keq) tuple in the annotations attribute under the key thermostability_<TEMP>-<METHOD_USED>. See ssbio.protein.sequence.properties.thermostability.get_dG_at_T() for instructions and details.
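The two stored numbers are linked by the standard relation between the free energy of unfolding and the unfolding equilibrium constant, Keq = [U]/[F] = exp(-dG / (R T)), so a positive dG means the folded state is favoured (Keq < 1). The sketch below assumes dG in J/mol, temperatures in degrees Celsius (as the at_temp values in this notebook suggest), and that sign convention. ssbio's get_dG_at_T() may use other units such as cal/mol, so treat this as an illustration rather than a reimplementation.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K)


def keq_from_dG(dG_unfolding, temp_celsius):
    """Unfolding equilibrium constant [U]/[F] = exp(-dG / (R T)), with dG in J/mol."""
    temp_kelvin = temp_celsius + 273.15
    return math.exp(-dG_unfolding / (GAS_CONSTANT * temp_kelvin))


# Placeholder dG of 20 kJ/mol, only to exercise the function
print(keq_from_dG(20000.0, 37.0))
```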
In [ ]:
# Thermostability - predicted free energy of unfolding (dG) from the protein sequence
# Stores (dG, Keq)
my_seq.get_thermostability(at_temp=32.0)
my_seq.get_thermostability(at_temp=37.0)
my_seq.get_thermostability(at_temp=42.0)
{k:v for k,v in my_seq.annotations.items() if k.startswith('thermostability_')}
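Each cell above filters my_seq.annotations by a key prefix or suffix. A small, hypothetical helper can instead group every stored entry by the text after the last "-" in its key: the tool suffix (biop, pepstats, amylpred, foldrate) or, for thermostability, the method name. It relies only on the key patterns documented above. The demo dictionary uses placeholder keys and values; in the notebook you would call group_by_source(my_seq.annotations).

```python
from collections import defaultdict


def group_by_source(annotations):
    """Group annotation entries by the suffix after the last '-' in each key."""
    grouped = defaultdict(dict)
    for key, value in annotations.items():
        name, sep, source = key.rpartition("-")
        if sep:  # keys without a '-' suffix are skipped
            grouped[source][name] = value
    return dict(grouped)


# Placeholder keys and values only; the key shapes follow the patterns described above
demo = {
    "aggprop-amylpred": 0,
    "kinetic_folding_rate_37.0-foldrate": 0.0,
    "thermostability_37.0-Dill": (0.0, 1.0),
    "organism": "placeholder",
}
print(group_by_source(demo))
```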